Bioinformatics A Practical Guide to Next Generation Sequencing Data Analysis (Hamid D. Ismail)

qiime demux emp-single \
  --i-seqs artifacts/multiplexed-emp-single-end.qza \
  --m-barcodes-file data/sample-metadata.tsv \
  --m-barcodes-column barcode-sequence \
  --o-per-sample-sequences artifacts/demux-single-end.qza \
  --o-error-correction-details artifacts/demux-details.qza

The command also requires the sample metadata file, passed as the input “m-barcodes-file”,
and the name of the metadata column that holds the barcode sequences, passed as
“m-barcodes-column”. The output is two artifacts: one for the demultiplexed reads and one
for the demultiplexing details.
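Because the demultiplexing command depends on a named column in the metadata file, it can help to confirm that the column exists before running it. The following is a minimal sketch, not part of QIIME2: the function name has_metadata_column is hypothetical, and it assumes that the metadata file is tab-separated and that its first line is the header.

```shell
# Hypothetical helper (not a QIIME2 command): succeed only if the
# tab-separated metadata file has a header column with the given name.
# Assumption: the header is the first line of the file.
has_metadata_column() {
    metadata_file=$1
    column_name=$2
    # Take the header line, put each column name on its own line,
    # and look for an exact, whole-line match.
    head -n 1 "$metadata_file" | tr '\t' '\n' | grep -Fxq -- "$column_name"
}
```

For example, running has_metadata_column data/sample-metadata.tsv barcode-sequence returns a zero exit status only when the header contains a column named exactly “barcode-sequence”.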

Figure 7.3 shows the commands and usages for demultiplexing both file formats.

Once the raw data have been imported into a QIIME2 artifact and any multiplexed reads have
been demultiplexed, all types of raw data are preprocessed and analyzed in the same way.
Therefore, we will discuss the remaining steps of the analysis with QIIME2 through a
worked example.

7.3.3 Downloading and Preparing the Example Data

As an example, we will download raw sequence data from the NCBI SRA database. The
data are amplicon-based 16S rRNA gene sequences generated by NGS for a study that
compared the effect of a yoga-based intervention with that of a low-FODMAP diet in patients
with irritable bowel syndrome. FODMAP is an acronym for Fermentable Oligosaccharides,
Disaccharides, Monosaccharides, and Polyols, which are short-chain carbohydrates that are
poorly absorbed in the small intestine. The metagenomic 16S rRNA gene data, sequenced
from fecal samples, are available for 86 patients with irritable bowel syndrome, grouped
into (i) patients who received yoga sessions and (ii) patients who received a low-FODMAP
diet. The NCBI BioProject accession for this study is PRJEB24421. We will download the
FASTQ files from the NCBI SRA database and then follow the QIIME2 pipeline to analyze
these data step by step. The data are demultiplexed paired-end sequences: two FASTQ files
(forward and reverse) for each sample.
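Since every sample should come with both a forward and a reverse file, a quick pairing check after the download can catch incomplete transfers. The sketch below is illustrative only: the function name find_unpaired is hypothetical, and it assumes the files are named <run>_1.fastq.gz (forward) and <run>_2.fastq.gz (reverse), a naming scheme that the text does not specify.

```shell
# Hypothetical helper: print every forward FASTQ file that has no reverse mate.
# Assumption: files are named <run>_1.fastq.gz and <run>_2.fastq.gz.
find_unpaired() {
    fastq_dir=$1
    for forward in "$fastq_dir"/*_1.fastq.gz; do
        # Skip the unexpanded pattern when the directory has no forward files.
        [ -e "$forward" ] || continue
        # Derive the expected reverse file name from the forward file name.
        reverse="${forward%_1.fastq.gz}_2.fastq.gz"
        [ -e "$reverse" ] || echo "$forward"
    done
}
```

If the function prints nothing for a directory, every forward file found there has a matching reverse file under the assumed naming scheme.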

7.3.3.1 Downloading the Raw Data

To download the files of all experiments, we need to obtain the run accessions of the
experiments in the BioProject. To keep files organized, we will create a directory to store
the raw data of this project. Open the Linux terminal, create a directory named after the
BioProject accession, and inside that directory create a subdirectory named “data”
as follows:

mkdir PRJEB24421

cd PRJEB24421

mkdir data
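The same layout can also be created in a way that is safe to repeat. This is a small sketch of an equivalent alternative using the standard -p option of mkdir, which creates missing parent directories and does not report an error when the directories already exist.

```shell
# Create the project directory and its "data" subdirectory in one step.
# -p creates missing parents and does not fail if the directories exist.
mkdir -p PRJEB24421/data
# Move into the project directory, as in the commands above.
cd PRJEB24421
```

Running these two commands a second time from the same starting directory leaves the existing directories untouched.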

In the next step, we will save the run accessions of the BioProject in a text file in the

“data” subdirectory. To do that, you can open the NCBI SRA database and search for the